/********************************validating IPC-based similarity measure********************************
Purpose: test the validity of the IPC-based similarity measure.
Datasets:
ipc7sim_validation_citing_fixed: pairwise IPC7-based cosine similarity for important references, background references, and fake references, holding the citing patent fixed. Important references are X- and Y-citations, as designated by the EPO, which indicate that at least one claim in the citing patent cannot be considered novel or does not involve an inventive step. Background references are all other types of actual citations. Fake references are constructed by matching the cited patent to another EP patent with the same application quarter, IPC 4-digit code, and grant status.
ipc7sim_validation_cited_fixed: pairwise IPC7-based cosine similarity for important references, background references, and fake references, holding the cited patent fixed. Important references are X- and Y-citations, as designated by the EPO, which indicate that at least one claim in the citing patent cannot be considered novel or does not involve an inventive step. Background references are all other types of actual citations. Fake references are constructed by matching the citing patent to another EP patent with the same application quarter, IPC 4-digit code, and grant status.
*/

log using "$DATA/ipc7sim_validation_log.txt", replace

***************Table B1 Panel A: simple comparison of X/Y references***************
*X/Y-citing patent (fixed) compared with X/Y-cited, simple-cited, and matched non-cited patents

use "$DATA/ipc7sim_validation_citing_fixed", clear
ttest ipc7simold == ipc7sim // difference between important references and fake references
ttest ipc7simold == ipc7simcited // difference between important references and background references
ttest ipc7simold == ipc7sim if ipc7simcited != . // difference between important references and fake references (conditional on having at least one background reference)
ttest ipc7simcited == ipc7sim if ipc7simcited != . // difference between background references and fake references
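
* Optional sanity check (a sketch added for illustration, not part of the
* original analysis): -ttest var1 == var2- is a paired test on within-row
* differences, so the first comparison above should give the same t statistic
* as a one-sample test that the mean difference is zero.
* diff_check is a hypothetical temporary variable; it is dropped automatically
* when the do-file ends.
tempvar diff_check
generate double `diff_check' = ipc7simold - ipc7sim
ttest `diff_check' == 0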


***************Table B1 Panel B: simple comparison of X/Y references***************
*X/Y-cited patent (fixed) compared with X/Y-citing, simple-citing, and matched non-citing patents

use "$DATA/ipc7sim_validation_cited_fixed", clear
ttest ipc7simold == ipc7sim // difference between important references and fake references
ttest ipc7simold == ipc7simciting // difference between important references and background references
ttest ipc7simold == ipc7sim if ipc7simciting != . // difference between important references and fake references (conditional on having at least one background reference)
ttest ipc7simciting == ipc7sim if ipc7simciting != . // difference between background references and fake references

log close